Extracting information from heterogeneous information sources using ontologically specified target views

Authors

  • Joachim Biskup
  • David W. Embley
Abstract

People want to know! And so do government agencies, information providers, search-and-retrieval companies, electronic publishers, corporate enterprises, and business professionals. But they’re swamped with volumes of structured and unstructured data spewed forth from databases, data warehouses, search engines, corporate intranets, news feeds, and the increasingly global Internet. They want critical information extracted and integrated in a personalized view, all automatically and at the level of a human expert. In conjunction with the collective efforts of data and knowledge workers, we attack this problem head on by offering, in this paper, a framework for addressing it. In our proposed framework we assume that a target view is specified ontologically and independently of any of the sources, and we model both the target and all the sources in the same modeling language. Then, for a given target and source, we generate a target-to-source mapping that has the properties necessary to load target facts from source facts. The mapping generator raises specific issues for a user’s consideration, but is endowed with defaults that allow it to run to completion with or without user input. In addition to the resulting target-to-source mapping, the mapping generator records alternative possibilities in a table. This table documents which possibilities were considered and why the selected ones were chosen, and it also provides confidence factors that measure the confidence in the selected mapping. The framework is based on a formal foundation, and we are able to prove that when a source has a valid interpretation, the part of the target loaded from the source according to a generated target-to-source mapping also has a valid interpretation. Given individual target-to-source mappings for several sources, the framework ...

∗Research for this paper was done during a sabbatical leave from Brigham Young University, Provo, Utah 84602, U.S.A.
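As a rough illustration of the mapping generator described in the abstract, the following Python sketch picks one source element per target element, falls back to a default when no user input is given, and records every alternative together with a confidence factor. It is not the authors' algorithm: the names `Candidate`, `MappingResult`, and `generate_mapping`, the element names, the numeric confidence values, and the "highest confidence wins" default rule are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    """One possible target-to-source correspondence (hypothetical structure)."""
    target_element: str
    source_element: str
    confidence: float  # assumed confidence factor in [0, 1]
    reason: str        # why this possibility was considered


@dataclass
class MappingResult:
    # Selected source element for each target element.
    mapping: dict[str, str] = field(default_factory=dict)
    # Table of every considered candidate and whether it was selected.
    alternatives: list[tuple[Candidate, bool]] = field(default_factory=list)


def generate_mapping(candidates, user_choices=None):
    """Select one source element per target element.

    A user choice is honored when it names one of the candidates;
    otherwise the default (assumed here: highest confidence) is used,
    so the generator always runs to completion.
    """
    user_choices = user_choices or {}
    by_target: dict[str, list[Candidate]] = {}
    for cand in candidates:
        by_target.setdefault(cand.target_element, []).append(cand)

    result = MappingResult()
    for target, options in by_target.items():
        default = max(options, key=lambda c: c.confidence)
        allowed = {c.source_element for c in options}
        wanted = user_choices.get(target)
        chosen = wanted if wanted in allowed else default.source_element
        result.mapping[target] = chosen
        for cand in options:
            result.alternatives.append((cand, cand.source_element == chosen))
    return result


# Hypothetical candidates for a small target view.
candidates = [
    Candidate("Person.name", "customer.full_name", 0.9, "name similarity"),
    Candidate("Person.name", "customer.nickname", 0.4, "partial name match"),
    Candidate("Person.phone", "customer.tel", 0.7, "value pattern match"),
]

result = generate_mapping(candidates)
overridden = generate_mapping(candidates, {"Person.name": "customer.nickname"})
```

Here `result` reflects the defaults only, while `overridden` shows a user decision taking precedence for one target element. In both cases the `alternatives` table keeps the rejected candidates and their reasons, which mirrors the abstract's idea of recording what was considered and why.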

Similar resources

Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views

People want to know! And so do government agencies, information providers, search-and-retrieval companies, electronic publishers, corporate enterprises, and business-intelligence professionals. But they're swamped with volumes of structured and unstructured data spewed forth from databases, data warehouses, search engines, corporate intranets, news feeds, and the increasing global Internet. Th...

What is information? with an Emphasis on Zins’s Views

Purpose: The aim of this paper is to explore some important views and theories of information and knowledge in the area of library and information science (LIS) and then to propose a new model of information and knowledge. Method: This research has been conducted by using a comparative content analysis of the existing materials on conceptualizations and definitions of information and knowledge...

Automatically Extracting Ontologically Specified Data from HTML Tables of Unknown Structure

Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem based on document-independent extraction ontologies. The solution entails elements of table understanding, data integration, and wrapper creation. Table understanding allows us to recognize attributes...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Assessing the information literacy of Farhangian University student-teachers from six perspectives: reviewing, recognizing sources, disseminating, recognizing, flexibility and seeking information

Background: Today, the scope of science and knowledge and, consequently, its measurement has become very common. The use of this information in the university environment depends on the students' knowledge of the places where the information is disseminated and how it is used and the methods of retrieving and using that information. The aim of this study was to investigate the information liter...

Journal:
  • Inf. Syst.

Volume 28

Publication date: 2003